Abstract: This paper proposes new architectures for numeric
function generators (NFGs) using piecewise arithmetic
expressions. The proposed architectures are programmable,
and they realize a wide range of numeric functions. To design
an NFG for a given function, we partition the domain of
the function into uniform segments, and transform a sub-function
in each segment into an arithmetic spectrum. From
this arithmetic spectrum, we derive an arithmetic expression,
and realize the arithmetic expression with hardware. Since the
arithmetic spectrum has many zero coefficients and repeated
coefficients, by storing only distinct nonzero coefficients in a
table, we can significantly reduce the table size needed to
store arithmetic coefficients. Experimental results show that
the table size can be reduced to only a small percent of the
table size needed to store all the arithmetic coefficients. We also
propose techniques to reduce table size further and to improve
performance.

Keywords: numeric function generators; piecewise arithmetic
expressions; nonzero arithmetic coefficients; programmable
architectures.

I.  Introduction 

Numeric functions, such as trigonometric, logarithmic, and
square root functions, and combinations of these, are widely
used in computer graphics, digital signal processing,
communication systems, robotics, etc. [5]. In these applications,
numeric functions are usually used as basic operations,
alongside addition and multiplication. In graphics applications
in particular, about half of the total processing time is spent
computing numeric functions [12]. Thus, numerically
intensive or real-time applications often require hardware
accelerators called numeric function generators (NFGs).
The computation of numeric functions has been
studied for more than 150 years [21], and various NFGs
have been proposed [2], [4], [13], [16], [17]. Many existing
NFGs are based on polynomial approximations.

For the design and verification of arithmetic circuits such
as adders and multipliers, the arithmetic transform is often
used because of its compactness [1], [3], [14], [20], [22].
For the design of NFGs, however, it is rarely used. Only
a few studies on NFGs using the arithmetic transform have
been reported [15], [19], and in both papers different
architectures are required for different numeric functions.


Although a dedicated NFG for a specific numeric function
is fast, supporting a wide range of numeric functions then
requires many NFGs to be designed. Since this consumes chip
area and accounts for much of the design and production costs, a
programmable NFG, which can compute various numeric
functions at high speed with a single architecture, is
required, along with a systematic design method. To satisfy
this requirement, this paper proposes new architectures and
a design method for programmable NFGs using the
arithmetic transform. In [15], [19], the arithmetic transform is
applied to the whole of a numeric function. However, this is
unsuitable for the design of programmable NFGs because it
requires too many additions. To design an efficient NFG, we
uniformly partition the domain of a given numeric function
into segments, and apply the arithmetic transform to the
sub-function for each segment. From the arithmetic spectrum
obtained by the transform, we derive an arithmetic expression,
and realize the arithmetic expression with memories and an
accumulator. By changing the memory data, we can realize
a wide range of numeric functions with a single architecture.

This paper is organized as follows: Section II introduces
a number representation for real numeric functions and
the arithmetic transform. Section III presents piecewise
arithmetic expressions and architectures for NFGs based on
them. Experimental results are shown in Section IV.
Section V presents techniques to reduce memory size and
to improve the performance of NFGs.

II.  Preliminaries 

A.  Number  Representation 

This subsection defines a number representation and
describes how to convert real functions into integer functions.

Definition 1: Let $B = \{0, 1\}$, $Z$ be the set of the integers,
and $R$ be the set of the real numbers. An n-input m-output
logic function is a mapping $B^n \to B^m$, a (binary-input)
integer function is a mapping $B^n \to Z$, and a real function
is a mapping $R \to R$.

Definition 2: A value X represented by the binary fixed-point
representation is denoted by

$X = (x_{n\_int-1}\ x_{n\_int-2}\ \cdots\ x_1\ x_0\ .\ x_{-1}\ x_{-2}\ \cdots\ x_{-n\_frac})_2,$


Report Documentation Page (Standard Form 298, Rev. 8-98)

Report date: May 2011
Title: Numeric Function Generators Using Piecewise Arithmetic Expressions
Performing organization: Naval Postgraduate School, Department of Electrical and Computer Engineering, Monterey, CA, 93943
Distribution/availability statement: Approved for public release; distribution unlimited.
Security classification (report, abstract, this page): unclassified
Number of pages: 6

Table I
Function table for 3-bit sin(X).

(a) Table for sin(X).     (b) Truth table for f_b(X).     (c) Table for f(X).

X        sin(X)           X        f_b(X)                 X      f(X)
0.000    0.000            0.000    0.000                  000    0
0.125    0.125            0.001    0.001                  001    1
0.250    0.247            0.010    0.010                  010    2
0.375    0.366            0.011    0.011                  011    3
0.500    0.479            0.100    0.100                  100    4
0.625    0.585            0.101    0.101                  101    5
0.750    0.682            0.110    0.101                  110    5
0.875    0.768            0.111    0.110                  111    6

where $x_i \in \{0, 1\}$ for $-n\_frac \le i \le n\_int - 1$, n_int is the
number of bits for the integer part, and n_frac is the number
of bits for the fractional part of X. We call

$X = \sum_{i=-n\_frac}^{n\_int-1} 2^i x_i$

an n-bit fixed-point representation in which n bits are used
to represent the value, where $n = n\_int + n\_frac$. In this
paper, an n-bit function f(X) means that the input variable
X has n bits.

We can convert a real function in fixed-point representation
to an n-input m-output logic function. The logic
function, in turn, can be converted into an integer function by
considering binary vectors as integers. That is, we can convert
a real function into an integer function $B^n \to P_m$, where
$P_m = \{0, 1, \ldots, 2^m - 1\}$. In this paper, numeric functions
are converted into integer functions by using a fixed-point
representation, unless stated otherwise. For simplicity,
each bit in the fixed-point representation of X is denoted by
$x_i$, where $x_0$ is the least significant bit.

Example 1: Table I (a) shows values of sin(X) for eight
values of X. Using a 3-bit fixed-point representation, this
function is converted into the logic function $f_b(X)$ in
Table I (b). By representing the output vectors as integers,
we have the integer function f(X) in Table I (c). In this
paper, the 3-bit sin(X) denotes the integer function f(X) in
Table I (c). (End of Example)
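The conversion in Example 1 can be sketched in a few lines of Python. This is an illustrative sketch only: the helper name `quantize_function` and the round-to-nearest rule are assumptions (the rounding rule is inferred from the values in Table I, which the text does not state explicitly).

```python
import math

def quantize_function(func, n_bits, m_bits):
    """Tabulate func on [0, 1) as an integer function (n_bits in, m_bits out)."""
    table = []
    for x_int in range(2 ** n_bits):
        x = x_int / 2 ** n_bits                  # real value of the fixed-point input
        scaled = func(x) * 2 ** m_bits           # shift m_bits fractional bits above the point
        table.append(math.floor(scaled + 0.5))   # round to nearest (assumed rounding rule)
    return table

sin3 = quantize_function(math.sin, n_bits=3, m_bits=3)
print(sin3)                                  # [0, 1, 2, 3, 4, 5, 5, 6]
print([format(v, "03b") for v in sin3])      # fractional bits of f_b(X)
```

The printed integers agree with the f(X) column of Table I (c), and the bit strings with the fractional parts of f_b(X) in Table I (b).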

B.  Arithmetic  Transform 

This  subsection  introduces  the  arithmetic  transform,  the 
arithmetic  spectrum,  and  the  arithmetic  expression  [18]. 

First,  define  a  matrix  operation  and  some  notation. 

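Definition 3: Let A be an $(n \times n)$ square matrix, where

$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}.$

Let B be an $(n \times n)$ square matrix. Then, the Kronecker
product of A and B is the $(n^2 \times n^2)$ matrix:

$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1}B & a_{n2}B & \cdots & a_{nn}B \end{bmatrix}.$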

Definition 4: Given a matrix M, the transposed matrix
$M^t$ is obtained by interchanging rows and columns
of M. For an n-bit integer function f(X), the function-vector
F is the column vector of the function values
$F = [f(0,0,\ldots,0), f(0,0,\ldots,1), \ldots, f(1,1,\ldots,1)]^t$.

We  define  the  arithmetic  transform  and  the  arithmetic 
spectrum  as  follows: 

Definition 5: The arithmetic transform matrix is

$A(n) = \bigotimes_{i=1}^{n} A(1), \quad \text{where} \quad A(1) = \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix},$

such that addition and multiplication are done in integer
arithmetic. For an integer function f given by the function-vector
F, the arithmetic spectrum $A_f = [a_0, a_1, \ldots, a_{2^n-1}]^t$
is

$A_f = A(n)F.$

Each $a_i$ in the spectrum is called an arithmetic coefficient.
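As a minimal sketch of the arithmetic transform, the Python below builds A(n) by repeated Kronecker products and also computes the same spectrum with an in-place butterfly. The function names are illustrative, and the butterfly is a standard fast evaluation of the same matrix product rather than something taken from this paper.

```python
A1 = [[1, 0], [-1, 1]]  # the basic transform matrix A(1)

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def transform_matrix(n):
    """A(n): the n-fold Kronecker product of A(1)."""
    m = [[1]]
    for _ in range(n):
        m = kron(m, A1)
    return m

def arithmetic_transform(F):
    """Arithmetic spectrum A(n) F, computed with the explicit matrix."""
    n = len(F).bit_length() - 1
    return [sum(a * f for a, f in zip(row, F)) for row in transform_matrix(n)]

def fast_arithmetic_transform(F):
    """Same spectrum via an in-place butterfly: one pass per input bit."""
    a = list(F)
    n = len(a).bit_length() - 1
    for i in range(n):
        for j in range(len(a)):
            if (j >> i) & 1:
                a[j] -= a[j ^ (1 << i)]   # subtract the entry with bit i cleared
    return a

F = [0, 1, 1, 2]  # function-vector of the 1-bit adder x1 + x2
print(arithmetic_transform(F))       # [0, 1, 1, 0]
print(fast_arithmetic_transform(F))  # [0, 1, 1, 0]
```

The explicit matrix needs $2^n \times 2^n$ entries, so the butterfly is the practical choice for anything beyond small n.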

Example 2: Consider the 1-bit adder function $f(x_1, x_2) = x_1 + x_2$.
The function-vector is $F = [0, 1, 1, 2]^t$. The arithmetic
spectrum is

$A_f = A(2)F = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 1 & -1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}.$

(End of Example)

Similarly,  we  define  the  inverse  arithmetic  transform  as 
follows: 

Definition 6: Let $A^{-1}(n)$ be the inverse arithmetic
transform matrix defined by

$A^{-1}(n) = \bigotimes_{i=1}^{n} A^{-1}(1), \quad A^{-1}(1) = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}.$

Definition 7: In a symbolic representation,

$A^{-1}(1) = [\,1 \quad x_i\,].$

Therefore, the inverse arithmetic transform is defined as

$f = X_a A_f, \quad X_a = \bigotimes_{i=1}^{n} [\,1 \quad x_i\,].$


Figure 1. 3-bit programmable NFG based on the arithmetic expression.

Figure 2. Programmable NFG based on the piecewise arithmetic expression.

Example 3: By the inverse arithmetic transform from the
arithmetic spectrum obtained in Example 2, the integer
function f is represented as follows:

$f = X_a A_f = [\,1 \quad x_2 \quad x_1 \quad x_1x_2\,] \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix} = x_1 + x_2.$

(End of Example)
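The inverse transform amounts to summing the coefficients whose product terms evaluate to 1. The sketch below evaluates an arithmetic expression directly from its spectrum; the function name and the convention that bit positions of the coefficient index select the variables of each product term are assumptions chosen to match the ordering $[1, x_2, x_1, x_1x_2]$ used in Example 3.

```python
def evaluate_arithmetic_expression(spectrum, x):
    """Evaluate f(x) from its arithmetic spectrum.

    Coefficient index j encodes a product term: the set bits of j are the
    variables in the term. The term is 1 exactly when every set bit of j is
    also set in x, that is, when (j & x) == j.
    """
    return sum(a for j, a in enumerate(spectrum) if a != 0 and (j & x) == j)

spectrum = [0, 1, 1, 0]  # arithmetic spectrum of x1 + x2
values = [evaluate_arithmetic_expression(spectrum, x) for x in range(4)]
print(values)  # [0, 1, 1, 2]
```

The result reproduces the function-vector of the 1-bit adder, as expected from applying the transform and then its inverse.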

From Definitions 6 and 7, we can see that an integer
function f(X) can be represented by the arithmetic spectrum
and the inverse arithmetic transform. That is,

Lemma 1: Using $A^{-1}(1)$ and $A(1)$, an integer function
f is represented as follows:

$f = A^{-1}(1)A(1)F = [\,1 \quad x_i\,] \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} f_0 \\ f_1 \end{bmatrix} = [\,1 \quad x_i\,] \begin{bmatrix} f_0 \\ f_1 - f_0 \end{bmatrix} = f_0 + x_i(f_1 - f_0), \quad (1)$

where $f_0 = f(x_i = 0)$ and $f_1 = f(x_i = 1)$. Equation (1) is the
arithmetic transform expansion (also called A-expansion or moment
decomposition [1]). The arithmetic expression for f is
obtained by applying the arithmetic transform expansion to each
variable in turn. The arithmetic coefficients correspond to the
coefficients of the arithmetic expression for f.

III. NFGs Based on Piecewise Arithmetic
Expressions

This section introduces piecewise arithmetic expressions,
and presents programmable architectures for NFGs based on
them.

A.  Piecewise  Arithmetic  Expressions 

Since a numeric function can be converted into an integer
function using the fixed-point representation, it can be
represented by the arithmetic expression:

$a_0 + a_1x_0 + a_2x_1 + a_3x_1x_0 + \ldots + a_{2^n-1}x_{n-1}x_{n-2}\cdots x_0.$

The arithmetic expression can be realized with only AND
gates and adders, and thus, it is realized with a compact
circuit when many arithmetic coefficients $a_i$ are zero. Since
many elementary functions, such as sin(x) and log(x), have
many zero arithmetic coefficients, we can design compact
NFGs for them [15], [19]. However, fixed-point representations
with many bits necessarily produce arithmetic
expressions with too many product terms, resulting in large
and slow NFGs. In addition, a straightforward programmable
implementation of the NFGs proposed in [15], [19], as
shown in Fig. 1, needs too many adders ($2^n - 1$ adders).

To reduce the number of product terms (adders), we
transform sub-functions into a set of arithmetic spectra,
instead of transforming the function over its whole domain into
a single arithmetic spectrum, and represent the function
using a set of arithmetic expressions. Then, we design a
programmable NFG using the set of arithmetic expressions.

To produce a set of arithmetic expressions, we partition
the domain of a given numeric function into uniform
segments, and apply the arithmetic transform to the
sub-function for each segment. Hence, we call the set of
arithmetic expressions a piecewise arithmetic expression. Note
that, in the piecewise arithmetic expression, we partition the
domain into segments using the most significant bits (MSBs)
of X.
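The segmentation step can be sketched as follows. This is an illustrative sketch, not the authors' design tool: the 8-bit sin(X) table, the choice of 4 MSBs, the round-to-nearest quantization, and all function names are assumptions made for the example.

```python
import math

def arithmetic_transform(F):
    """Arithmetic spectrum of F via an in-place butterfly."""
    a = list(F)
    n = len(a).bit_length() - 1
    for i in range(n):
        for j in range(len(a)):
            if (j >> i) & 1:
                a[j] -= a[j ^ (1 << i)]
    return a

def piecewise_spectra(F, n_msb):
    """Split F into 2**n_msb uniform segments and transform each sub-function."""
    seg_len = len(F) >> n_msb
    return [arithmetic_transform(F[s * seg_len:(s + 1) * seg_len])
            for s in range(1 << n_msb)]

def evaluate_piecewise(spectra, x):
    """MSBs of x pick the segment; LSBs drive the product terms."""
    seg_len = len(spectra[0])
    segment, lsbs = divmod(x, seg_len)
    return sum(a for j, a in enumerate(spectra[segment]) if (j & lsbs) == j)

N_BITS, N_MSB = 8, 4   # hypothetical small instance
F = [math.floor(math.sin(x / 2 ** N_BITS) * 2 ** N_BITS + 0.5)
     for x in range(2 ** N_BITS)]
spectra = piecewise_spectra(F, N_MSB)
nonzero = sum(1 for s in spectra for a in s if a != 0)
print(len(spectra), len(spectra[0]), nonzero)
```

Each segment's expression has at most $2^k$ product terms, where k is the number of LSBs, instead of $2^n$ for the whole function.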

B.  Architectures  for  Programmable  NFGs 

By realizing a set of the arithmetic spectra for the piecewise
arithmetic expression with a memory (called the arithmetic
coefficients table), we obtain the NFG in Fig. 2. The MSBs
of X select a segment (an arithmetic spectrum), and then an
arithmetic expression is computed using the least significant
bits (LSBs) of X. This NFG requires $2^k - 1$ adders, where
k is the number of LSBs. Thus, it is more compact
and faster than the NFG based on the single arithmetic
expression in Fig. 1, in which $2^n - 1$ adders are required.

Figure 3. Programmable NFG based on a sequential computation. (a) Overall architecture. (b) Generalized product term.

Figure 4. Improved architecture for programmable NFG.

Unfortunately, the number of adders is still large, and
this design is inefficient. Since the arithmetic spectra usually
have many zero coefficients, the arithmetic coefficients table
is sparse, and many unnecessary additions are performed. To
perform only the necessary additions, and to reduce the number
of adders, we propose the architecture shown in Fig. 3.
Note that, for readability, Fig. 3 omits the enable signal
(an external input) that starts the computation (resets the
registers), the done signal (an external output) that indicates
the end of the computation, and the multiplexers.

In this architecture, only the nonzero arithmetic coefficients
are stored in a table. By reading out each coefficient
sequentially, it computes the arithmetic expression using an
accumulator. Each product term is computed with the circuit
in Fig. 3(b) using a don't care bit pattern. In the don't care
bit pattern, bits corresponding to input variables that do not
appear in a product term are set to 1. For example, for a 5-bit
function, the bit pattern for the product term $x_3x_0$ is 10110.
The start address table stores, for each segment, a start address
of the nonzero arithmetic coefficients table and the don't care
bits table. In this architecture, the evaluation time
of an arithmetic expression is proportional to the number
of nonzero arithmetic coefficients. Thus, a numeric function
that has many zero arithmetic coefficients can be computed
at high speed.
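A behavioral model of this sequential scheme might look like the sketch below. It is a software illustration under stated assumptions, not the circuit of Fig. 3: the product-term test `(lsbs | pattern) == mask` is one plausible reading of the generalized product term, the extra sentinel entry in the start address table is an assumed way to find where a segment ends, and the test function and all names are hypothetical.

```python
def arithmetic_transform(F):
    """Arithmetic spectrum of F via an in-place butterfly."""
    a = list(F)
    n = len(a).bit_length() - 1
    for i in range(n):
        for j in range(len(a)):
            if (j >> i) & 1:
                a[j] -= a[j ^ (1 << i)]
    return a

def dont_care_pattern(term, k):
    """k-bit pattern with 1s for the variables that do NOT appear in the term."""
    return ~term & ((1 << k) - 1)

def build_tables(F, k):
    """Start address table, nonzero coefficients table, don't care bits table."""
    start, coeffs, patterns = [], [], []
    for base in range(0, len(F), 1 << k):
        start.append(len(coeffs))
        for term, a in enumerate(arithmetic_transform(F[base:base + (1 << k)])):
            if a != 0:                      # zero coefficients are never stored
                coeffs.append(a)
                patterns.append(dont_care_pattern(term, k))
    start.append(len(coeffs))               # sentinel: end of last segment (assumption)
    return start, coeffs, patterns

def nfg_sequential(x, k, start, coeffs, patterns):
    """Accumulate one stored coefficient per step for the selected segment."""
    mask = (1 << k) - 1
    segment, lsbs = x >> k, x & mask
    acc = 0
    for addr in range(start[segment], start[segment + 1]):
        if (lsbs | patterns[addr]) == mask:  # AND over all bits of (x_i OR d_i)
            acc += coeffs[addr]
    return acc

K = 3
F = [(x * x) // 8 for x in range(64)]        # hypothetical 6-bit test function
tables = build_tables(F, K)
print(format(dont_care_pattern(0b01001, 5), "05b"))   # 10110, the x3*x0 example
print(all(nfg_sequential(x, K, *tables) == F[x] for x in range(64)))  # True
```

The loop length equals the number of stored coefficients in the segment, which is why evaluation time in this model scales with the number of nonzero coefficients.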

To further reduce the number of arithmetic coefficients to
be stored in a table, we store each repeated coefficient only once.
Fig. 4 shows the improved architecture. By using pointers to
the distinct arithmetic coefficients instead of directly storing
the coefficients, we can significantly reduce the bit width of
the table if the number of distinct coefficients is small.
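The pointer idea can be illustrated with a small sketch. The size accounting here (pointer width of $\lceil \log_2 d \rceil$ bits for d distinct values, plus one full-width entry per distinct value) is an assumption made for illustration and is not claimed to match how the table sizes in this paper are computed; the coefficient list is made up.

```python
def build_pointer_tables(coeffs):
    """Replace each stored coefficient by a pointer into a table of distinct values."""
    distinct = sorted(set(coeffs))
    index = {c: i for i, c in enumerate(distinct)}
    pointers = [index[c] for c in coeffs]
    return distinct, pointers

def table_bits(coeffs, coeff_width):
    """Bits needed with direct storage versus pointer-based storage (assumed model)."""
    distinct, pointers = build_pointer_tables(coeffs)
    pointer_width = max(1, (len(distinct) - 1).bit_length())
    direct = len(coeffs) * coeff_width
    indirect = len(pointers) * pointer_width + len(distinct) * coeff_width
    return direct, indirect

coeffs = [3, -1, 3, 3, 7, -1, 7, 3]          # hypothetical nonzero coefficients
distinct, pointers = build_pointer_tables(coeffs)
print(distinct, pointers)                     # [-1, 3, 7] [1, 0, 1, 1, 2, 0, 2, 1]
print(table_bits(coeffs, coeff_width=16))     # (128, 64)
```

The saving grows with the ratio of stored coefficients to distinct coefficients, since only the narrow pointers are repeated.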

IV.  Experimental  Results 

To show the efficiency of the proposed NFGs, we compare
the table size and the number of additions for the three
proposed NFGs. Table II shows the experimental results. In
this table, the column "No. of additions" denotes the number
of additions needed to compute an arithmetic expression for
each segment. Since, in the NFGs in Figs. 3 and 4, the
number of product terms differs between arithmetic expressions,
the average number of additions per expression is shown.

Since the programmable NFG based on the single arithmetic
expression in Fig. 1 requires $2^{16} = 65{,}536$ registers
and $2^{16} - 1 = 65{,}535$ adders, the proposed NFGs based on
the piecewise arithmetic expression require several orders
of magnitude fewer adders, and much less storage. As
shown in Table II, for many numeric functions, the number
of nonzero arithmetic coefficients and the number of distinct
arithmetic coefficients are small. Thus, by storing only these,
we significantly reduce the size of the arithmetic coefficients
table, resulting in a reduction of the total size of the tables.
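The coefficient-table figures in Table II can be cross-checked with simple arithmetic. The sketch below copies three rows from Table II and assumes a 16-bit coefficient width, which is inferred from 1,048,576 / 65,536 = 16 rather than stated in the text; the dictionary layout and function name are illustrative.

```python
ALL_COEFFS = 65_536      # coefficients stored by NFG1
ALL_BITS = 1_048_576     # NFG1 coefficients table size in bits
SEGMENTS = 2 ** 8        # 8 MSBs select the segment

# (nonzero count, distinct count, NFG2 table bits, NFG3 table bits) from Table II
ROWS = {
    "2^X":     (35_524, 445, 568_384, 7_120),
    "ln(X+1)": (36_914, 402, 590_624, 6_432),
    "sin(X)":  (31_327, 391, 501_232, 6_256),
}

def summarize(rows):
    """Check table sizes against counts and derive percentages and averages."""
    width = ALL_BITS // ALL_COEFFS           # inferred coefficient width in bits
    out = {}
    for name, (nonzero, distinct, nfg2_bits, nfg3_bits) in rows.items():
        out[name] = {
            "nfg2_matches": nonzero * width == nfg2_bits,
            "nfg3_matches": distinct * width == nfg3_bits,
            "nfg3_percent": 100 * nfg3_bits / ALL_BITS,
            "avg_terms": nonzero / SEGMENTS,
        }
    return width, out

width, summary = summarize(ROWS)
for name, s in summary.items():
    print(name, s["nfg2_matches"], s["nfg3_matches"],
          f"{s['nfg3_percent']:.2f}%", f"{s['avg_terms']:.1f}")
```

For these rows the distinct-coefficient table is below 1% of the full table, and the nonzero count divided by 256 segments reproduces the "No. of additions" average reported for NFG2 and NFG3.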

V.  Improvement  Techniques  for  NFGs 
A.  Piecewise  Polynomial  Approximation 

By  using  a  polynomial  approximation,  we  can  reduce  the 
number  of  nonzero  arithmetic  coefficients,  and  thus,  the  table 
size  and  the  number  of  additions  can  be  further  reduced.  This 
is  based  on  the  following  lemma: 

Lemma 2: [8] For an n-bit k-th degree polynomial function
$f(X) = c_kX^k + c_{k-1}X^{k-1} + \ldots + c_0$, the number of
nonzero arithmetic coefficients is at most

$\sum_{i=0}^{k} \binom{n}{i}.$

Table II
Table size and the number of additions for 16-bit NFGs.

                  No. of stored coefficients          Size of coefficients table (bits)   Total size of tables (bits)         No. of additions
Function f(X)     NFG1        NFG2        NFG3        NFG1        NFG2        NFG3        NFG1        NFG2        NFG3        NFG1        NFG2,3
                  (all)       (nonzero)   (distinct)                                                                                      (average)
2^X               65,536      35,524      445         1,048,576   568,384     7,120       1,048,576   892,196     651,093     255         138.8
(name lost)       65,536      36,590      587         1,048,576   585,440     9,392       1,048,576   918,846     709,872     255         142.9
ln(X+1)           65,536      36,914      402         1,048,576   590,624     6,432       1,048,576   926,946     674,980     255         144.2
log2(X+1)         65,536      35,069      451         1,048,576   561,104     7,216       1,048,576   880,821     642,554     255         137.0
1/(X+1)           65,536      37,702      401         1,048,576   603,232     6,416       1,048,576   946,646     689,549     255         147.3
sqrt(X+1)         65,536      37,316      323         1,048,576   597,056     5,168       1,048,576   936,996     681,275     255         145.8
sin(X)            65,536      31,327      391         1,048,576   501,232     6,256       1,048,576   787,015     573,982     255         122.4
tan(X)            65,536      33,397      548         1,048,576   534,352     8,768       1,048,576   839,021     647,955     255         130.5
sin^-1(X)         65,536      32,059      532         1,048,576   512,944     8,512       1,048,576   805,315     622,005     255         125.2
tan^-1(X)         65,536      32,463      401         1,048,576   519,408     6,416       1,048,576   815,415     594,590     255         126.8

NFG1: the NFG shown in Fig. 2.
NFG2: the NFG shown in Fig. 3.
NFG3: the NFG shown in Fig. 4.
No. of additions: the number of additions needed to compute each arithmetic expression.
The number of MSBs for uniform segmentation is 8.
The domain of all functions is $0 \le X < 1$.


We approximate a given numeric function using a piecewise
polynomial within a desired error, and then transform
the polynomial into an arithmetic expression in each segment.
The piecewise arithmetic expression obtained in this
way is realized with a compact NFG.

Example 4: Consider a piecewise quadratic polynomial
approximation of a 16-bit numeric function using 256
uniform segments. Then, a polynomial in each segment
has 8 bits. Thus, the total number of nonzero arithmetic
coefficients is at most

$256 \times \sum_{i=0}^{2} \binom{8}{i} = 256 \times 37 = 9{,}472,$

and the number of additions is only 36. (End of Example)
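The count in Example 4 can be checked numerically. The sketch below assumes the usual form of the bound, $\sum_{i=0}^{k}\binom{n}{i}$, which follows because a product term with more than k variables cannot arise from a degree-k polynomial in $X = \sum 2^i x_i$ (each $x_i^2 = x_i$); the particular quadratic $3X^2 + 5X + 7$ and the function names are arbitrary choices for the illustration.

```python
from math import comb

def arithmetic_transform(F):
    """Arithmetic spectrum of F via an in-place butterfly."""
    a = list(F)
    n = len(a).bit_length() - 1
    for i in range(n):
        for j in range(len(a)):
            if (j >> i) & 1:
                a[j] -= a[j ^ (1 << i)]
    return a

def nonzero_bound(n, k):
    """Upper bound on nonzero coefficients for an n-bit degree-k polynomial."""
    return sum(comb(n, i) for i in range(k + 1))

n, k = 8, 2
F = [3 * x * x + 5 * x + 7 for x in range(2 ** n)]    # arbitrary quadratic
spectrum = arithmetic_transform(F)
nonzero_terms = [j for j, a in enumerate(spectrum) if a != 0]

print(len(nonzero_terms), nonzero_bound(n, k))         # 37 37
print(max(bin(j).count("1") for j in nonzero_terms))   # 2
print(len(nonzero_terms) - 1)                          # 36 additions per segment
print(256 * nonzero_bound(n, k))                       # 9472 coefficients in total
```

Every nonzero coefficient belongs to a product term with at most two variables, so one segment needs at most 37 terms and therefore 36 additions, in line with the example.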

In this way, by using piecewise polynomial approximation,
more compact and faster programmable NFGs based
on the piecewise arithmetic expression can be produced.

B.  Parallel  Computation 

The proposed NFGs based on a sequential computation in
Figs. 3 and 4 produce arithmetic coefficients one by one,
and compute each product term of an arithmetic expression
sequentially. Thus, they require $O(N)$ computation time,
and are obviously slower than the NFG in Fig. 2, which
requires $O(\log N)$ computation time, where N is the number
of product terms.

On the other hand, the NFG in Fig. 2 produces all arithmetic
coefficients simultaneously, and adds all terms of an
arithmetic expression at once. Thus, it is faster but requires
many more adders. Note that the adders have different sizes.
However, in an FPGA or ASIC, this is not a problem because
adders of different sizes can be easily accommodated.

These two designs are extreme cases. By changing the
number of terms to be computed in parallel, we can explore
the design space, taking into account the tradeoff between the
number of adders and the computation time, and can produce
an NFG optimized for the target application.
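A rough way to picture this design space is a toy cost model. Everything in the sketch is an assumption made for illustration and is not taken from the paper: p terms are summed per step by an adder tree of p - 1 adders, one further adder accumulates across steps when more than one step is needed, and a step count stands in for computation time.

```python
import math

def design_point(num_terms, parallel):
    """Return (adders, steps) for summing num_terms with `parallel` terms per step."""
    steps = math.ceil(num_terms / parallel)
    # Adder tree for one step, plus an accumulator adder if steps are chained.
    adders = (parallel - 1) + (1 if steps > 1 else 0)
    return adders, steps

N = 256   # product terms of one 8-LSB segment, counting the constant term
for p in (1, 4, 16, 64, 256):
    print(p, design_point(N, p))
```

Under this model, p = 1 corresponds to the fully sequential extreme with a single accumulator, and p = 256 to the fully parallel extreme with 255 adders, matching the $2^k - 1$ adders quoted for k = 8.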

VI.  Conclusion  and  Comments 

This  paper  proposes  new  architectures  for  programmable 
NFGs  using  piecewise  arithmetic  expressions,  and  design 
methods  for  them.  We  also  propose  techniques  to  reduce 
table  size  and  to  improve  performance.  Experimental  results 
show  that  the  size  of  the  arithmetic  coefficients  table  can  be 
reduced  to  only  a  few  percent  of  the  table  size  needed  to 
store  all  the  arithmetic  coefficients.  By  using  the  proposed 
NFGs,  we  can  realize  a  wide  range  of  numeric  functions 
with  a  single  architecture,  and  we  can  switch  the  functions 
by  only  changing  the  contents  of  tables. 

In this paper, we used about one-half of the input bits to
partition the domain of a function into uniform segments.
However, there could be an optimum number of bits for
each function. Thus, we will study optimum segmentations
as future work. We will also analyze the relation between
the number of nonzero arithmetic coefficients and the
characteristics of functions.